Survey Data Analysis Case Study

Bethany Gardner

2024-08-29

Why this project?

  • Using one of my dissertation experiments as a case study (published as a GitHub repository and Quarto book)
  • I’ve chosen to talk about this one because it involves the most data preprocessing/management steps, and I was in charge of all of them

About the experiment

  • Studying language learning and processing mechanisms for singular they
  • Including pronouns on nametags and in introductions is a common recommendation for creating a more gender-inclusive environment. We know it can affect people’s perception of an environment, but does it also affect their language use?
  • Participants:
    • Learned about a set of fictional characters (he/him, she/her, and they/them)
    • Nametag condition: Varied whether the nametags included pronouns
    • Introduction condition: Varied whether the introductions to the characters explicitly stated their pronouns (This is Alex, who uses they/them pronouns. They…)
    • Speech production task eliciting possessive pronouns (Alex gave the apple to their brother.)
    • Survey about their demographics, experience with singular they, and attitudes about singular they

About the data

  • Audio data, transcribed and annotated for which pronouns were produced, plus survey data for each participant
  • Do the nametag and introduction conditions affect accuracy in producing singular they?
  • If production accuracy is internally reliable, is it predicted by demographics, language attitude, or language experience measures?

Pipeline overview

Power analysis

Create a data frame matching the structure of the proposed experiment, and estimate fixed and random effect sizes based on prior experiments.

# get Pronoun * PSA interaction from Exp2 production model
load("r_data/exp2.RData")

exp2_r_effect_size <- exp2_m_prod@model |>
  tidy() |>
  filter(term == "Pronoun=They_HeShe:PSA=GenLang") |>
  pull(estimate) |>
  round(2)

exp2_r_effect_size       # log-odds
exp(exp2_r_effect_size)  # odds ratio

# start with 108 participants each doing 30 trials
exp3_pw_data_struct <- data.frame(
  Participant = rep(as.factor(1:108), each = 30),
  Trial = rep(as.factor(1:30), 108)
)

# Trials are split between 3 Pronoun Pair conditions, which are contrast-coded
# to compare:
# (1) They|HeShe vs HeShe|They + HeShe|SheHe
# (2) HeShe|They vs HeShe|SheHe
exp3_pw_data_struct <- exp3_pw_data_struct |>
  bind_cols(
    "Pronoun" = rep(rep(factor(c("He", "She", "They")), each = 10), 108)
  )
contrasts(exp3_pw_data_struct$Pronoun) <- cbind(
  "_T vs HS" = c(.33, .33, -.66),
  "_H vs S"  = c(-.5, .5, 0)
)

# Nametag and Introduction conditions vary in a 2x2 between-participants design,
# and both are effects-coded (mean-centered).
exp3_pw_data_struct <- exp3_pw_data_struct |>
  bind_cols(
    "Nametag" = rep(rep(factor(c(0, 0, 1, 1)), each = 30), 108 / 4),
    "Intro" = rep(rep(factor(c(0, 1, 0, 1)), each = 30), 108 / 4)
  )

contrasts(exp3_pw_data_struct$Nametag) <- cbind("_No_Yes" = c(-.5, .5))
contrasts(exp3_pw_data_struct$Intro) <- cbind("_No_Yes" = c(-.5, .5))

# Item is defined as each unique image-name-pronoun combination. There are 6
# sets of characters, and each list sees 3, making 18 unique characters.
exp3_pw_data_struct <- exp3_pw_data_struct |>
  bind_cols(
    "Character" = rep(as.factor(1:18), each = 30 / 3, 108 / 6)
  )
str(exp3_pw_data_struct)

exp3_pw_data_struct |>
  group_by(Nametag, Intro) |>
  summarise(n_distinct(Participant))

# The closest thing to existing data is the Exp2 (written) production task.
# Since interpreting effect sizes is apparently more complicated for logistic
# regression, let's go with the Exp2 results as a baseline. That's a rough
# estimate of how much harder they/them is to produce than he/him and she/her.
# And let's set the hypothetical Nametag and Introduction effects to be about
# the same size as the PSA. Hopefully that's small enough to be kind of
# conservative with the power analysis, but not aiming for effects too small to
# be practically relevant.
exp2_m_prod_fixed <- exp2_m_prod@model |>
  tidy() |>
  filter(effect == "fixed") |>
  select(term, estimate)
exp2_m_prod_fixed

# Predictions for Exp3 based on ranges from Exp2:
exp3_pw_fixed <- c(
  +0.75,  # Intercept                    Medium
  +3.00,  # Pronoun: T vs HS             Largest
  -0.10,  # Pronoun: H vs S              NS, maybe small
  +0.10,  # Nametag                      NS, maybe small
  +0.10,  # Introduction                 NS, maybe small
  -2.00,  # Pronoun: T vs HS * Nametag   Same size as PSA interaction
  -0.10,  # Pronoun: H vs S  * Nametag   NS, maybe small
  -2.00,  # Pronoun: T vs HS * Intro     Same size as PSA interaction
  -0.10,  # Pronoun: H vs S  * Intro     NS, maybe small
  +0.25,  # Nametag * Intro              Maybe small
  -2.00,  # 3 way T vs HS                Same size as PSA interaction
  -0.10   # 3 way H vs S                 NS, maybe small
)

# The model for the Exp2 production task only converged with random intercepts
# by item, and no random effects by participant.
exp2_m_prod_random <- VarCorr(exp2_m_prod@model)

# The model for the Exp1 production task only converged with random intercepts
# and slopes by participant, and no random effects by item.
load("r_data/exp1.RData")
exp1_m_prod_random <- VarCorr(exp1a_m_prod@model)


# So, I'll combine those two as a starting place to estimate the random effects.
# It's possible the actual data won't converge with the maximal random effects
# structure, but for now let's assume it will.
exp3_pw_random <- exp1_m_prod_random
exp3_pw_random[["Item"]] <- exp2_m_prod_random[["Name"]]

# Create model with this data structure, fixed effects, and random effects
exp3_pw_m_108 <- makeGlmer(
  formula = SimAcc ~ Pronoun * Nametag * Intro +
    (Pronoun | Participant) + (1 | Character),
  family = binomial,
  fixef = exp3_pw_fixed,
  VarCorr = exp3_pw_random,
  data = exp3_pw_data_struct
)
summary(exp3_pw_m_108)

Power analysis

Use {simr} (Green and MacLeod 2016) to simulate the power for each effect (Pronoun × Nametag/Intro, Pronoun × Nametag × Intro) at 108, 132, 156, and 180 participants.

# Simulate data
exp3_pw_sim_data <- doSim(exp3_pw_m_108)
exp3_pw_data_struct <- exp3_pw_data_struct |>
  bind_cols("SimAcc" = exp3_pw_sim_data)

summary(exp3_pw_data_struct)

# Code to run simulation:
powerSim(
  exp3_pw_m_108,
  nsim = 1000,
  test = fixed("Pronoun_T vs HS:Nametag_No_Yes", "z")
)

# Then extend model to larger N
exp3_pw_m_132 <- extend(exp3_pw_m_108, along = "Participant", n = 132)

# Load and join results
exp3_pw_results <- bind_rows(
    .id = "sim",
    "2_108" = readRDS("r_data/exp3_power_2way_N108.RDA") |> summary(),
    "2_132" = readRDS("r_data/exp3_power_2way_N132.RDA") |> summary(),
    "2_156" = readRDS("r_data/exp3_power_2way_N156.RDA") |> summary(),
    "2_180" = readRDS("r_data/exp3_power_2way_N180.RDA") |> summary(),
    "3_132" = readRDS("r_data/exp3_power_3way_N132.RDA") |> summary(),
    "3_156" = readRDS("r_data/exp3_power_3way_N156.RDA") |> summary()
  ) |>
  mutate(
    n_participants = str_sub(sim, 3),
    effect = case_when(
      str_sub(sim, 0, 1) == "2" ~ "Pronoun * Nametag/Intro",
      str_sub(sim, 0, 1) == "3" ~ "Pronoun * Nametag * Intro"
    )
  ) |>
  column_to_rownames(var = "sim")

Power analysis

  • We determined that 156 participants, each completing 30 trials, would have 0.93 [0.91, 0.94] power at α = .05 to detect the two-way interactions (Pronoun × Nametag/Introduction).
  • Note that in cognitive psychology, the goal is to have enough statistical power to detect differences between experimental conditions, not necessarily to generalize differences between participant groups to the entire population.
  • We can get a decently representative sample of respondents from Prolific, but we did not apply population weights.
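For context, the confidence interval on a simulated power estimate is just a binomial proportion interval over the simulations. {simr} reports an exact binomial interval; the normal-approximation sketch below (a hypothetical helper, not from the actual pipeline) gives a close stand-in:

```python
import math


def simulated_power_ci(n_significant, n_sim, z=1.96):
    """Normal-approximation CI for a power estimate: the proportion of
    simulations where the effect reached significance is a binomial
    proportion with standard error sqrt(p * (1 - p) / n_sim)."""
    p = n_significant / n_sim
    half_width = z * math.sqrt(p * (1 - p) / n_sim)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

With 928 significant runs out of 1,000 simulations, this gives roughly 0.93 [0.91, 0.94], the same scale as the interval reported above.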

Pipeline overview

Audio data (AWS S3)

  • PCIbex, our experiment platform, sends the audio data to an AWS S3 bucket
  • It’s most efficient to download the data from S3 once and run the rest of the analyses locally, instead of querying S3 every time
  • Bash script to download new data; check that an audio file for each trial for each participant exists as expected; then unzip, convert, and sort audio files

Audio data (AWS S3)

#!/bin/bash

# Options:
#   s   sync data from AWS
#   p   check participant list
#   z   unzip and sort audio files
#   c   run tests on PCIbex output and audio file names
#   t   transcribe


while getopts "spzct" option; do
  case $option in
    s)  # Sync audio data from S3
        echo "Getting data from AWS"
        cd ../data/s3/
        aws s3 sync s3://they3 .
        cd ../../preprocessing/
        ;;
    p)  # Get list of participants from PCIbex data to update participant list
        echo "Checking audio data to see what needs to be added to the participant list"
        Rscript participant_list.R
        ;;
    z)  # Unzip the audio data and convert it to WAV files in dirs for each participant
        echo "Unzipping, converting, and sorting the audio files"
        python s3_to_wav.py
        ;;
    c)  # Check output
        echo "Checking the audio file names against the PCIbex data"
        Rscript check_output.R
        ;;
    t)  # Transcribe
        echo "Transcribing"
        python transcribe.py
        ;;
  esac
done

Transcribe using whisper

  • First pass for transcription using the whisper model (Radford et al. 2022)
  • Pros: fairly quick; runs locally, so no copy of the identifiable data leaves our machines
  • Cons: does not transcribe speech errors and disfluencies

Transcribe using whisper

import os
import whisper
import pandas as pd
from pathlib import Path


# ---- Helper functions ----- #
def make_transcription_df():
    """Set up dfs for transcription data.

    Returns:
        list: one df per participant, with columns for `participant_id`,
            `prolific_id`, and `trial_id`, indexed by `file_path`
    """
    transcriptions = []
    participant_dirs = [
        p for p in audio_dir.iterdir()
        if not p.match("*temp*") and not p.match("*incomplete*")
    ]

    for p in participant_dirs:
        audio_list = [a.stem for a in p.glob('*.wav')]
        trials = [get_trial_info(p.name, a) for a in audio_list]
        df = pd.DataFrame(
            trials,
            columns=['file_path', 'participant_id', 'prolific_id', 'trial_id'],
        )
        df = df.set_index('file_path').sort_values(by='trial_id')
        transcriptions.append(df)
    return transcriptions


def get_trial_info(p_dir, file_name):
    """Get trial info from the name of the audio file.

    Args:
        p_dir (str): dir for participant data
        file_name (str): audio file within participant's data dir 

    Returns:
        list: .wav file name (Path), participant ID (str), prolific ID (str),
            and trial ID (str)
    """    
    participant_id, prolific_id = p_dir.split('_')
    trial_id = file_name.removeprefix(prolific_id + '_').removesuffix('.wav')
    return [
        audio_dir / p_dir / f"{file_name}.wav",
        participant_id, prolific_id, trial_id
    ]


def run_whisper_on_participant(df, model):
    """Use whisper to transcribe a trial.

    Args:
        df (df): structure for transcription data from `make_transcription_df()`,
            which has `participant_id` as the first column and is indexed by the
            path to the audio file
        model (whisper model): loaded whisper model (medium English-only)
    """    
    participant_id = df.iloc[0, 0]
    file_path = text_dir / f"{participant_id}_whisper.csv"
    if not os.path.exists(file_path):
        print(participant_id)
        df['text'] = df.index.map(lambda t: whisper.transcribe(model, str(t))['text'])
        print(df['text'])
        df.to_csv(file_path)
     

# ---- Main function ----- #
def transcribe_trials():
    """Main function to transcribe .wav files using whisper."""    
    transcriptions = make_transcription_df()
    model = whisper.load_model('medium.en')
    for p in transcriptions:
        run_whisper_on_participant(p, model)
        
    return transcriptions


# ---- Run ----- #
audio_dir = Path('..') / 'data' / 'exp2_audio'
text_dir = Path('..') / 'data' / 'exp2_transcription'

transcribe_trials()

Check transcriptions

  • An RA listened to the audio and added disfluencies back in
  • Coded which pronouns were produced; accuracy is determined by the final pronoun
participant id condition nametag intro pronoun pair target pronoun target id distractor pronoun trial id transcription he his she her they their disfluency multiple pronouns pronoun produced accuracy
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical01_he Taylor gave the chocolate to his brother. 0 1 0 0 0 0 1 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical02_he Taylor gave the cherries to their brother. 0 0 0 0 0 1 0 0 their 0
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical03_he Taylor gave the avocado to his brother. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical04_he Taylor gave the pumpkin to his brother. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical05_he Sam gave the bread to his--Taylor gave the bread to his sister. 0 1 0 0 0 0 1 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical06_he Taylor gave the balloon to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical07_he Taylor gave the cards to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_HS he 10 she nametag_list4_critical08_he Taylor gave the glasses to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical09_he Taylor gave the corn to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical10_he Taylor gave the kiwi to his brother. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical11_he Taylor gave his brother grapes. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical12_he Taylor gave the pear to his brother. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical13_he Taylor gave the wa--orange juice-- 0 0 0 0 0 0 1 0 none NA
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical14_he Taylor gave the yellow cap to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical15_he Taylor gave the scissors to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_T he 10 they nametag_list4_critical16_he Taylor gave the suitcase to his sister. 0 1 0 0 0 0 0 0 his 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical17_she Jordan gave the spoon to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical18_she Jordan gave the broccoli to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical19_she Jordan gave her brother an egg. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical20_she Jordan gave the strawberry to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical21_she Jordan gave the pineapple to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical22_she Jordan gave the bucket to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical23_she Jordan gave the watch to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_HS she 11 he nametag_list4_critical24_she Jordan gave the guitar to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical25_she Jordan gave the banana to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical26_she Jordan gave the bacon to their brother--to her brother. 0 0 0 1 0 1 1 1 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical27_she Jordan gave the ice cream to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical28_she Jorin gave the carrot to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical29_she Jordan gave the lemon to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical30_she Jordan gave the stuffed animal to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical31_she Jordan gave the rose to her sister. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 HS_T she 11 they nametag_list4_critical32_she Jordan gave the water bottle to her brother. 0 0 0 1 0 0 0 0 her 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical33_they Sam gave the pizza to their brother. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical34_they Sam gave the plate to their brother. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical35_they Sam gave the orange to her brother. 0 0 0 1 0 0 1 0 her 0
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical36_they Sam gave the potato to her--their brother. 0 0 0 1 0 1 1 1 their 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical37_they Sam gave the apple to their sister. 0 0 0 0 0 1 1 0 their 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical38_they Sam gave the brown bag to her sister. 0 0 0 1 0 0 0 0 her 0
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical39_they Sam gave the teddy bear to their sister. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 she nametag_list4_critical40_they Sam gave the paintbrush to their sister. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical41_they Sam gave the mushroom to their sister. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical42_they Sam gave the onion to their brother. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical43_they Sam gave their brother a watermelon. 0 0 0 0 0 1 1 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical44_they Sa--Sam gave the knife to her brother--to their brother. 0 0 0 1 0 1 1 1 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical45_they Sam gave the cookie to their brother. 0 0 0 0 0 1 0 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical46_they Sam gave the tomato to their sister. 0 0 0 0 0 1 1 0 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical47_they Sam gave the pencil to her sister--their sister. 0 0 0 1 0 1 1 1 their 1
P202 nametag 1 0 T_HS they 12 he nametag_list4_critical48_they Sam gave the soccer ball to their sister. 0 0 0 0 0 1 0 0 their 1
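The accuracy column in the sample above follows the final-pronoun rule: when a self-repair contains multiple pronouns, the last one produced (the post-repair form) is scored against the target. A simplified Python sketch of that rule (the real R code handles more edge cases, such as object-position her in "gave her a"):

```python
import re

# map possessive form -> target pronoun label used in the coding sheet
POSSESSIVES = {"his": "he", "her": "she", "their": "they"}


def final_pronoun_accuracy(transcription, target_pronoun):
    """Score a trial: find every possessive pronoun, keep the last one
    produced (self-repairs end on the intended form), compare to target."""
    produced = re.findall(r"\b(his|her|their)\b", transcription.lower())
    if not produced:
        return None  # no pronoun produced; accuracy is NA
    return int(POSSESSIVES[produced[-1]] == target_pronoun)
```

For example, "Sam gave the potato to her--their brother." with target they scores 1, because the repaired form their comes last.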

Check transcriptions

Tests check the manual coding against regexes and check for completeness

library(here)
library(tidyverse)
library(testthat)
library(readxl)


# Load data from coded CSVs----
df <- list.files(
  path = "data/exp2_coding",
  pattern = "*coded.csv",
  full.names = TRUE
) |>
  set_names() |>
  map(read.csv) |>
  list_rbind(names_to = "participant_id") |>
  mutate(participant_id = str_sub(str_split_i(participant_id, "/", 3), 0, 4)) |>
  mutate(
    .before = trial_id,
    character_list = str_remove(str_split_i(trial_id, "_", 2), "list")
  ) |>
  mutate(across(
    c(ends_with("id"), contains("pronoun"), condition, character_list),
    as.factor
  )) |>
  mutate(across(
    starts_with("transcription"),
    \(x) {
      case_when(
        x == "" ~ NA,
        x == " ." ~ NA,
        .default = str_replace(str_trim(x), "Jamie", "Jaime")  # fix spelling
      )
    }
  )) |>
  mutate(drop_trial = case_when(  # don't drop trial if sentence is cut off
    str_detect(transcription_manual, "Alex") ~ 0,
    str_detect(transcription_manual, "Casey") ~ 0,
    str_detect(transcription_manual, "Jaime") ~ 0,
    str_detect(transcription_manual, "Jordan") ~ 0,
    str_detect(transcription_manual, "Sam") ~ 0,
    str_detect(transcription_manual, "Taylor") ~ 0,
    .default = drop_trial
  )) |>
  filter(str_starts(participant_id, "P"))  # drop incompletes


# Calculate pronoun produced and accuracy----
df2 <- df |>
  mutate(
    sum = his + her + their,
    multiple_pronouns = ifelse(sum > 1, 1, 0),
    he_loc = ifelse(
      multiple_pronouns == 1 & he == 1,
      str_locate(str_to_lower(transcription_manual), "\\bhe"), 0
    ),
    his_loc = ifelse(
      multiple_pronouns == 1 & his == 1,
      str_locate(str_to_lower(transcription_manual), "\\bhi"), 0
    ),
    she_loc = ifelse(
      multiple_pronouns == 1 & she == 1,
      str_locate(str_to_lower(transcription_manual), "\\bshe\\b"), 0
    ),
    her_loc = ifelse(
      multiple_pronouns == 1 & her == 1,
      str_locate(str_to_lower(transcription_manual), "\\bher\\b"), 0
    ),
    they_loc = ifelse(
      multiple_pronouns == 1 & they == 1,
      str_locate(str_to_lower(transcription_manual), "\\bthey\\b"),
      0
    ),
    their_loc = ifelse(
      multiple_pronouns == 1 & their == 1,
      str_locate(str_to_lower(transcription_manual), "\\bthei"),
      0
    ),
    max_loc = ifelse(
      multiple_pronouns == 1,
      pmax(he_loc, his_loc, she_loc, her_loc, they_loc, their_loc),
      NA
    ),
    pronoun_produced = factor(
      case_when(
        sum == 0 ~ "none",
        multiple_pronouns == 0 & (his == 1 | he == 1) ~ "his",
        multiple_pronouns == 0 & (her == 1 | she == 1) ~ "her",
        multiple_pronouns == 0 & (their == 1 | they == 1) ~ "their",
        multiple_pronouns == 1 & max_loc == he_loc ~ "his",
        multiple_pronouns == 1 & max_loc == his_loc ~ "his",
        multiple_pronouns == 1 & max_loc == she_loc ~ "her",
        multiple_pronouns == 1 & max_loc == her_loc ~ "her",
        multiple_pronouns == 1 & max_loc == they_loc ~ "their",
        multiple_pronouns == 1 & max_loc == their_loc ~ "their",
        is.na(max_loc) & participant_id == "P329" &
          str_detect(transcription_manual, "chocolate") ~ "his",
        is.na(max_loc) & participant_id == "P492" &
          str_detect(transcription_manual, "pencil") ~ "their"
      ),
      levels = c("his", "her", "their", "none")
    ),
    accuracy = case_when(
      pronoun_produced == "none" ~ NA,
      as.character(pronoun_produced) == "his" &
        as.character(target_pronoun) == "he" ~ 1,
      as.character(pronoun_produced) == "her" &
        as.character(target_pronoun) == "she" ~ 1,
      as.character(pronoun_produced) == "their" &
        as.character(target_pronoun) == "they" ~ 1,
      .default = 0
    )
  )

# Drop trials with no data, drop calculation/extra columns, add item ID----
df3 <- df2 |>
  filter(drop_trial == 0 & !is.na(transcription_manual)) |>
  rename(transcription = transcription_manual) |>
  select(
    -prolific_id, -correct_description, -transcription_whisper, -name_only,
    -drop_trial, -notes, -sum, -ends_with("loc")
  )

character_lists <- read_excel(
  path = here("materials", "exp2_stimuli.xlsx"),
  sheet = "Character Sets",
  skip = 2,
  col_types = c("skip", rep("text", 7), rep("skip", 9))
) |>
  rename_with(str_to_lower) |>
  mutate(across(everything(), as.factor))

df3 <- df3 |>
  left_join(
    character_lists |> select(list, item, pronouns),
    by = join_by(character_list == list, target_pronoun == pronouns),
    relationship = "many-to-one"
  ) |>
  relocate(item, .after = target_pronoun) |>
  rename(target_id = item) |>
  select(-character_list)


# Checks----
test_that("No NAs in pronoun variables", {
  expect_false(any(is.na(df3$he)))
  expect_false(any(is.na(df3$his)))
  expect_false(any(is.na(df3$she)))
  expect_false(any(is.na(df3$her)))
  expect_false(any(is.na(df3$they)))
  expect_false(any(is.na(df3$their)))
  expect_false(any(is.na(df2$name_only)))  # name_only is dropped from df3, so check df2
})

test_that("Pronoun variables match regex", {
  df_test <- df3 |>
    mutate(transcription = str_to_lower(transcription)) |>
    select(participant_id, transcription, he, his, she, her, they, their) |>
    mutate(
      has_he = case_when(
        str_detect(transcription, "\\bhe\\b") ~ 1,
        .default = 0
      ),
      has_his = case_when(
        str_detect(transcription, "\\bhis\\b") ~ 1,
        str_detect(transcription, "\\bhi-") ~ 1,
        .default = 0
      ),
      has_she = case_when(
        str_detect(transcription, "\\bshe\\b") ~ 1,
        .default = 0
      ),
      has_her = case_when(
        str_detect(transcription, "gave her a") ~ 0,
        str_detect(transcription, "her glasses back") ~ 0,
        str_detect(transcription, "pencil to h--") ~ 1,
        str_detect(transcription, "\\bher\\b") ~ 1,
        .default = 0
      ),
      has_they = case_when(
        str_detect(transcription, "\\bthey\\b") ~ 1,
        .default = 0
      ),
      has_their = case_when(
        str_detect(transcription, "\\btheir\\b") ~ 1,
        str_detect(transcription, "to thei-") ~ 1,
        str_detect(transcription, "to theirs") ~ 1,
        str_detect(transcription, "chocolate to th--") ~ 1,
        .default = 0
      )
    )

  expect_true(all(df_test$he == df_test$has_he))
  expect_true(all(df_test$his == df_test$has_his))
  expect_true(all(df_test$she == df_test$has_she))
  expect_true(all(df_test$her == df_test$has_her))
  expect_true(all(df_test$they == df_test$has_they))
  expect_true(all(df_test$their == df_test$has_their))
})

test_that("All multiple pronoun trials coded as disfluencies and final coded", {
  df_dis <- df3 |>
    filter(multiple_pronouns == 1 & disfluency == 0) |>
    filter(!(he == 1 & his == 1)) |>
    filter(!(she == 1 & her == 1)) |>
    filter(!(they == 1 & their == 1))
  expect_equal(nrow(df_dis), 0)

  expect_false(any(is.na(df3$pronoun_produced)))
})


# Counts----
participants_missing_trials <- df |>
  filter(!is.na(transcription_manual) & drop_trial == 0) |>
  summarise(.by = participant_id, n = n_distinct(trial_id))
sum(participants_missing_trials$n)

df3 |>
  summarise(
    .by = c(condition, target_pronoun),
    mean = mean(accuracy, na.rm = TRUE)
  ) |>
  arrange(target_pronoun)


# Export----
write_csv(df3, file = "data/exp2_pronouns.csv")

Pipeline overview

Text data

Survey data is written in a log-file format that needs to be parsed to remove irrelevant rows, select the participant/condition-level data recorded at the start of the session, and select the trial-level data recorded with each trial:

Preprocess

R code to wrangle survey data:

library(tidyverse)
library(janitor)
library(readxl)

# MAIN DF----
## Read PCIbex output----
d_survey <- list.files(path = "data/exp2_PCIbex/", full.names = TRUE) |>
  map_df(
    ~read.csv(., header = FALSE, fill = TRUE, col.names = paste("V", 1:26))
  ) |>
  filter(str_detect(V.1, "#") == FALSE) |>  # drop PennController comments
  select(V.13, V.6, V.9, V.15, V.10, V.11) |>  # drop PennController extra cols
  rename(  # name PennController output columns
    trial_type  = V.6,  # trial label in PCIbex
    trial_part  = V.9,  # type of trial data
    parameter   = V.10,
    response    = V.11,
    prolific_id = V.13,  # first trial variable saved is prolific_id
    trial_item  = V.15
  ) |>
  filter(str_detect(  # get demographics and familiarity questions
    trial_type, "demographics|sentences|they|transphobia"
  )) |>
  # Remove status update rows for them
  filter(parameter != "_Header_" & parameter != "_Trial_") |>
  filter(parameter != "First" & parameter != "Unselect") |>
  filter(parameter != "Status" & parameter != "Filename") |>
  filter(!(
    # remove rows that indicate last item selected in check box
    parameter == "Choice" &
      (
        trial_part == "enter_they" | trial_part == "enter_trans" |
          trial_part == "enter_sexuality" | trial_part == "enter_race"
      )
  )) |>
  filter(!(parameter == "Final" & response == ""))  # write-in box empty


## Match to Participant ID----
participant_list <- "data/participant_list.xlsx" |>
  read_xlsx(sheet = 1, range = cell_cols(1:10)) |>
  clean_names() |>
  select(ends_with("id"), condition) |>
  filter(!is.na(participant_id)) |>
  mutate(across(everything(), as.factor))

d_survey <- d_survey |>
  left_join(participant_list, by = "prolific_id") |>
  relocate(participant_id, .before = 1) |>
  select(-prolific_id, -condition)

## Exclusions----
d_survey <- d_survey |> filter(str_detect(participant_id, "P")) |> droplevels()


## Question categories & items----
d_survey <- d_survey |>
  mutate(
    .after = participant_id,
    category =
      case_when(
        trial_type == "rate_sentences" ~ "Sentence Naturalness Ratings",
        trial_type == "transphobia_scale" ~ "Transphobia Scale",
        trial_part == "enter_they" ~ "Familiarity With They/Them Pronouns",
        str_detect(trial_part, "intro|nametag") ~
          "Familiarity With Pronoun-Sharing Practices",
        str_detect(trial_part, "age") ~ "Age",
        str_detect(trial_part, "gender") ~ "Gender",
        str_detect(trial_part, "trans") ~ "Transgender & Gender-Diverse",
        str_detect(trial_part, "sexuality") ~ "Sexuality",
        str_detect(trial_part, "race") ~ "Race/Ethnicity",
        str_detect(trial_part, "english") ~ "English Experience",
        str_detect(trial_part, "ed") ~ "Education"
      ) |>
      as.factor(),
    item =
      case_when(
        trial_part == "enter_intro_others" ~ "Intros: Others",
        trial_part == "enter_intro_self" ~ "Intros: Self",
        trial_part == "enter_nametags_others" ~ "Nametags: Others",
        trial_part == "enter_nametags_self" ~ "Nametags: Self",
        str_detect(parameter, "for myself") ~ "Myself",
        str_detect(parameter, "am close to") ~ "Close To",
        str_detect(parameter, "have met") ~ "Have Met",
        str_detect(parameter, "have not met")  ~ "Heard About",
        str_detect(parameter, "had not heard") ~ "Not Heard About",
        trial_item != "" ~ trial_item,
        str_detect(category, "Trans|Sexuality|Race") ~ parameter,
        str_detect(category, "Ed|Eng") ~ response,
        str_detect(category, "Gender") ~ category
      ) |>
      recode_factor(
        "generic" = "Generic",
        "each" = "Each",
        "every" = "Every",
        "neu" = "Neutral\nName",
        "fem" = "Fem\nName",
        "masc" = "Masc\nName"
      ) |>
      str_replace_all(c(
        "%2C" = ",", "2 year" = "2-year", "4 year" = "4-year", "term:" = "term"
      )) |>
      as.factor()
  ) |>
  select(-starts_with("trial"), -parameter)


## Response types----
d_survey$response <- d_survey$response |>
  str_replace_all(c("%2C" = ",", "2 year" = "2-year", "4 year" = "4-year"))

d_survey <- d_survey |>
  mutate(
    response_num = case_when(
      !is.na(as.numeric(response)) ~ as.numeric(response),
      is.na(as.numeric(response)) ~ NA
    ),
    response_bool = case_when(
      response == "checked" ~ TRUE,
      response == "unchecked" ~ FALSE,
      response != "checked" & response != "unchecked" ~ NA
    ),
    response_cat = case_when(
      is.na(response_num) & is.na(response_bool) ~ response,
      .default = NA
    ),
    item = case_when(
      !is.na(item) ~ item,
      category == "Age" & response_num <= 24 ~ "18–24",
      category == "Age" & response_num >= 25 & response_num <= 34 ~ "25–34",
      category == "Age" & response_num >= 35 & response_num <= 44 ~ "35–44",
      category == "Age" & response_num >= 45 & response_num <= 54 ~ "45–54",
      category == "Age" & response_num >= 55 & response_num <= 64 ~ "55–64",
      category == "Age" & response_num >= 65 & response_num <= 74 ~ "65–74",
      category == "Age" & response_num >= 75 ~ "75+"
    )
  ) |>
  mutate(across(where(is.character), as.factor)) |>
  select(-response) |>
  filter(!is.na(item))


## Recode gender----
d_survey |>
  filter(category == "Gender") |>
  pull(response_cat) |>
  droplevels() |>
  unique()

# Group similar responses
d_survey$response_cat <- d_survey$response_cat |>
  recode_factor(
    "female" = "Woman", "f" = "Woman", "Femal" = "Woman", "Female" = "Woman",
    "FEMALE" = "Woman", "woman" = "Woman", "WOMAN" = "Woman", "Female " = "Woman",
    "female/woman" = "Woman", "Female/Woman" = "Woman", "cis woman" = "Woman",
    "Cisfemale" = "Woman", "cisgender woman" = "Woman", "transwoman" = "Woman",
    "male" = "Man", "MALE" = "Man", "Male" = "Man", "Male " = "Man", "Man" = "Man",
    "cis-gender male" = "Man", "cis male" = "Man", "TRANS MAN" = "Man",
    "Transgender Man" = "Man", "Nonbinary" = "Nonbinary spectrum",
    "nonbinary" = "Nonbinary spectrum", "Non-binary" = "Nonbinary spectrum",
    "non binary" = "Nonbinary spectrum", "Transfem nonbinary" = "Nonbinary spectrum",
    "Male and nonbinary" = "Nonbinary spectrum", "she/they" = "Nonbinary spectrum",
    "genderfluid" = "Nonbinary spectrum", "Genderfluid" = "Nonbinary spectrum",
    "questioning" = "Questioning"
  )

d_survey |>
  filter(category == "Gender") |>
  pull(response_cat) |>
  droplevels() |>
  unique()


## Write-in responses----
d_survey$item <- d_survey$item |>
  recode_factor("Final" = "I use a different term")

# Just keep one row for yes to diff term + write-in box
d_survey <- d_survey |>
  mutate(
    response_bool = case_when(
      item == "I use a different term" & !is.na(response_cat) ~ TRUE,
      .default = response_bool
    )
  ) |>
  filter(
    !(item == "I use a different term" & is.na(response_cat) & response_bool == TRUE)
  ) |>
  filter(response_cat != "Normal" | is.na(response_cat))  # drop bad-faith write-in ("Normal") that also checked straight


## Add missing data----
missing <- tibble(
  participant_id = c(
    rep("P277", 31), rep("P278", 31), rep("P419", 31),
    rep("P482", 31), rep("P502", 31)
  ),
  category = rep(
    c(
      "Age", "Education", "English Experience",
      rep("Familiarity With Pronoun-Sharing Practices", 4),
      rep("Familiarity With They/Them Pronouns", 5),
      "Gender", "Race/Ethnicity", "Sexuality",
      rep("Sentence Naturalness Ratings", 6),
      "Transgender & Gender-Diverse",
      rep("Transphobia Scale", 9)
    ),
    5
  ),
  item = rep(
    c(
      rep("Missing Data", 3),
      "Intros: Others", "Intros: Self", "Nametags: Others", "Nametags: Self",
      "Myself", "Close To", "Have Met", "Heard About", "Not Heard About",
      "Missing Data", "Missing Data", "Missing Data",
      "Masc Name", "Fem Name", "Neutral Name", "Generic", "Every", "Each",
      "Missing Data",
      paste(
        "I am uncomfortable around people who don’t conform to traditional",
        "gender roles, e.g., aggressive women or emotional men."
      ),
      "I avoid people on the street whose gender is unclear to me.",
      paste(
        "I think there is something wrong with a person who says that they",
        "are neither a man nor a woman."
      ),
      paste(
        "I would be upset if someone I’d known a long time revealed to me",
        "that they used to be another gender."
      ),
      paste(
        "When I meet someone, it is important for me to be able to identify",
        "them as a man or a woman."
      ),
      "I believe that a person can never change their gender.",
      paste(
        "A person’s genitalia define what gender they are, e.g., a penis",
        "defines a person as being a man, a vagina defines a person as being a",
        "woman."
      ),
      paste(
        "I don’t like it when someone is flirting with me, and I can’t tell",
        "if they are a man or a woman."
      ),
      "I believe that the male/female dichotomy is natural."
    ),
    5
  )
)

missing <- missing |>
  mutate(
    response_num = NA_real_,
    response_bool = NA,
    response_cat = NA_character_
  ) |>
  mutate(across(where(is.character), as.factor))

d_survey <- bind_rows(d_survey, missing) |>
  distinct() |>
  mutate(participant_id = as.factor(as.character(participant_id)))


# Aggregates----
## Age----
agg_age <- d_survey |>
  filter(category == "Age") |>
  select(participant_id, response_num) |>
  arrange(participant_id) |>
  rename(age = response_num)


## TGD----
agg_TGD <- d_survey |>
  filter(
    category == "Transgender & Gender-Diverse" & response_bool == TRUE
  ) |>
  select(participant_id, item, response_bool) |>
  mutate(
    response_coded = case_when(
      str_detect(item, "is different") ~ 1,
      item == "I consider myself transgender" ~ 1,
      .default = 0
    )
  ) |>
  summarise(
    .by = participant_id,
    TGD = sum(response_coded) |> recode(`2` = 1)
  )


## LGBQ----
agg_LGBQ <- d_survey |>
  filter(category == "Sexuality" & response_bool == TRUE) |>
  select(participant_id, item, response_bool) |>
  mutate(response_coded = ifelse(str_detect(item, "As|Bi|Gay|Queer"), 1, 0)) |>
  summarise(
    .by = participant_id,
    LGBQ = sum(response_coded) |> recode(`2` = 1, `3` = 1)
  )


## Transphobia scale----
agg_TS <- d_survey |>
  filter(category == "Transphobia Scale") |>
  select(participant_id, response_num) |>
  mutate(response_coded = response_num - 1) |>
  summarise(.by = participant_id, gender_beliefs = sum(response_coded))


## Sentence ratings----
agg_ratings <- d_survey |>
  filter(category == "Sentence Naturalness Ratings") |>
  select(participant_id, item, response_num) |>
  mutate(
    type = ifelse(
      str_detect(item, "Name"), "rating_name", "rating_indefinite"
    )
  ) |>
  summarise(.by = c(participant_id, type), rating = mean(response_num)) |>
  pivot_wider(names_from = "type", values_from = "rating")


## Familiarity with using they/them----
agg_they <- d_survey |>
  filter(str_detect(category, "They/Them") & response_bool == TRUE) |>
  select(participant_id, item, response_bool) |>
  pivot_wider(names_from = item, values_from = response_bool) |>
  mutate(Myself_Close = ifelse(
    Myself == TRUE & `Close To` == TRUE, TRUE, NA
  )) |>
  mutate(
    .keep = c("unused"),
    familiarity = case_when(
      Myself_Close == TRUE | `Close To` == TRUE | Myself == TRUE ~ 3,
      `Have Met` == TRUE ~ 2,
      `Heard About` == TRUE | `Not Heard About` == TRUE ~ 1
    )
  )


## Familiarity with pronoun-sharing----
agg_sharing <- d_survey |>
  filter(str_detect(category, "Sharing")) |>
  select(participant_id, item, response_cat) |>
  mutate(response_coded = case_when(
    response_cat == "Always"    | response_cat == "All"   ~ 5,
    response_cat == "Usually"   | response_cat == "Most"  ~ 4,
    response_cat == "Sometimes" | response_cat == "Some"  ~ 3,
    response_cat == "Rarely"    | response_cat == "A few" ~ 2,
    str_detect(response_cat, "prefer not to") ~ 1,
    str_detect(response_cat, "not heard") ~ 0,
    response_cat == "None"                        ~ 0
  )) |>
  summarise(.by = participant_id, sharing = sum(response_coded))


## Merge----
d_agg <- participant_list |>
  select(participant_id, condition) |>
  left_join(agg_age, by = "participant_id") |>
  left_join(agg_LGBQ, by = "participant_id") |>
  left_join(agg_TGD, by = "participant_id") |>
  left_join(agg_ratings, by = "participant_id") |>
  left_join(agg_sharing, by = "participant_id") |>
  left_join(agg_they, by = "participant_id") |>
  left_join(agg_TS, by = "participant_id")


# Demographics table----
d_demographics <- d_survey |>
  filter(
    category %in% c(
      "Age", "Gender", "Transgender & Gender-Diverse", "Sexuality",
      "Race/Ethnicity", "Education", "English Experience"
    )
  ) |>
  filter(response_bool == TRUE | is.na(response_bool)) |>
  mutate(group = case_when(
    category == "Gender" ~ as.character(response_cat),
    category == "English Experience" ~ as.character(response_cat),
    category == "Education" ~ as.character(response_cat),
    category == "Sexuality" ~ as.character(item),
    category == "Race/Ethnicity" ~ as.character(item),
    category == "Transgender & Gender-Diverse" ~ as.character(item),
    category == "Age" ~ as.character(item)
  )) |>
  select(-(starts_with("response")), -item) |>
  mutate(group = group |>
    replace_na("Prefer not to answer / Missing data") |>
    recode_factor(
      "Prefer not to answer" = "Prefer not to answer / Missing data",
      "prefer not to answer" = "Prefer not to answer / Missing data",
      "Missing Data" = "Prefer not to answer / Missing data"
    )
  ) |>
  summarise(.by = c(category, group), total = n_distinct(participant_id))

dem_totals <- d_demographics |>
  summarise(.by = category, total = sum(total)) |>
  mutate(group = "Total")

d_demographics <- d_demographics |>
  bind_rows(dem_totals) |>
  arrange(category, group)


# Export----
write_csv(d_survey, "data/exp2_survey.csv")
write_csv(d_agg, "data/exp2_participant_covariates.csv")
write_csv(d_demographics, "data/exp2_demographics.csv")

Preprocess

Survey questions parsed:

ParticipantID Condition List Category Item Response_Num Response_Bool Response_Cat
3_001 both 1 Sentence Naturalness Ratings Masc Name 1 NA NA
3_001 both 1 Sentence Naturalness Ratings Fem Name 1 NA NA
3_001 both 1 Sentence Naturalness Ratings Neutral Name 1 NA NA
3_001 both 1 Sentence Naturalness Ratings Every 5 NA NA
3_001 both 1 Sentence Naturalness Ratings Generic 1 NA NA
3_001 both 1 Sentence Naturalness Ratings Each 6 NA NA
3_001 both 1 Familiarity With They/Them Pronouns Myself NA FALSE NA
3_001 both 1 Familiarity With They/Them Pronouns Close To NA FALSE NA
3_001 both 1 Familiarity With They/Them Pronouns Have Met NA FALSE NA
3_001 both 1 Familiarity With They/Them Pronouns Heard About NA TRUE NA
3_001 both 1 Familiarity With They/Them Pronouns Not Heard About NA FALSE NA
3_001 both 1 Familiarity With Pronoun-Sharing Practices Intros: Others NA NA Some
3_001 both 1 Familiarity With Pronoun-Sharing Practices Intros: Self NA NA Never, because I prefer not to
3_001 both 1 Familiarity With Pronoun-Sharing Practices Nametags: Others NA NA Most
3_001 both 1 Familiarity With Pronoun-Sharing Practices Nametags: Self NA NA Never, because I prefer not to
3_001 both 1 Transphobia Scale I am uncomfortable around people who don't conform to traditional gender roles, e.g., aggressive women or emotional men. 2 NA NA
3_001 both 1 Transphobia Scale I avoid people on the street whose gender is unclear to me. 2 NA NA
3_001 both 1 Transphobia Scale I think there is something wrong with a person who says that they are neither a man nor a woman. 6 NA NA
3_001 both 1 Transphobia Scale I would be upset if someone I'd known a long time revealed to me that they used to be another gender. 5 NA NA
3_001 both 1 Transphobia Scale When I meet someone, it is important for me to be able to identify them as a man or a woman. 7 NA NA
3_001 both 1 Transphobia Scale I believe that a person can never change their gender. 7 NA NA
3_001 both 1 Transphobia Scale A person's genitalia define what gender they are, e.g., a penis defines a person as being a man, a vagina defines a person as being a woman. 7 NA NA
3_001 both 1 Transphobia Scale I don't like it when someone is flirting with me, and I can't tell if they are a man or a woman. 7 NA NA
3_001 both 1 Transphobia Scale I believe that the male/female dichotomy is natural. 7 NA NA
3_001 both 1 Age 45-54 53 NA NA
3_001 both 1 Gender Gender NA NA Male
3_001 both 1 Transgender & Gender-Diverse My gender is the same as what was written on my original birth certificate NA TRUE NA
3_001 both 1 Transgender & Gender-Diverse My gender is different than what was written on my original birth certificate NA FALSE NA
3_001 both 1 Transgender & Gender-Diverse I consider myself cisgender NA FALSE NA
3_001 both 1 Transgender & Gender-Diverse I consider myself transgender NA FALSE NA
3_001 both 1 Transgender & Gender-Diverse I don't consider myself cisgender or transgender NA FALSE NA
3_001 both 1 Transgender & Gender-Diverse Prefer not to answer NA FALSE NA
3_001 both 1 Sexuality Asexual NA FALSE NA
3_001 both 1 Sexuality Bisexual/Pansexual NA FALSE NA
3_001 both 1 Sexuality Gay/Lesbian NA FALSE NA
3_001 both 1 Sexuality Heterosexual/Straight NA TRUE NA
3_001 both 1 Sexuality Queer NA FALSE NA
3_001 both 1 Sexuality Questioning NA FALSE NA
3_001 both 1 Sexuality Prefer not to answer NA FALSE NA
3_001 both 1 Sexuality I use a different term NA FALSE NA
3_001 both 1 Education Professional degree NA NA Professional degree
3_001 both 1 English Experience Native (learned from birth) NA NA Native (learned from birth)
3_001 both 1 Race/Ethnicity American Indian or Alaska Native NA FALSE NA
3_001 both 1 Race/Ethnicity Asian NA FALSE NA
3_001 both 1 Race/Ethnicity Black, African American, or African NA FALSE NA
3_001 both 1 Race/Ethnicity Hispanic, Latino, or Spanish NA FALSE NA
3_001 both 1 Race/Ethnicity Middle Eastern or North African NA FALSE NA
3_001 both 1 Race/Ethnicity Native Hawaiian or Pacific Islander NA FALSE NA
3_001 both 1 Race/Ethnicity White NA TRUE NA
3_001 both 1 Race/Ethnicity Prefer not to answer NA FALSE NA
3_001 both 1 Race/Ethnicity I use a different term NA FALSE NA

Preprocess

Survey questions coded into potential covariates:

ParticipantID Condition Age LGBQ TGD Rating_Generic Rating_Name Sharing UseThey GenderBeliefs
3_001 both 53 0 0 4.000000 1.000000 9 1 41
3_002 nametag 21 1 0 6.666667 7.000000 16 3 0
3_003 nametag 43 0 0 5.666667 6.333333 4 2 12
3_004 nametag 50 0 0 5.666667 4.000000 6 1 22
3_005 intro 37 0 0 6.333333 1.333333 4 2 19
3_006 intro 35 1 0 6.000000 6.000000 17 3 0
3_007 intro 32 0 0 4.666667 5.333333 9 2 15
3_008 intro 48 0 0 5.666667 3.000000 9 2 8

Pipeline overview

Merge data

  • It’s easy to merge the participant-level survey data with the trial-level pronoun data by joining on the participant ID
  • For bigger projects, I write custom functions to load/set up data to ensure that the output is always identical

Merge data

# Loads accuracy data, sets up contrast coding and scaling----
exp3_load_data_acc <- function() {
  library(dplyr)
  library(forcats)
  library(scales)

  d <- read.csv("data/exp3_pronouns.csv", stringsAsFactors = TRUE) |>
    select(ParticipantID, Nametag, Intro, Pronoun_Pair, T_ID, Accuracy)

  # Remove trials with no pronouns
  d <- d |> filter(!is.na(Accuracy))

  # Mean-center effects code Nametag and Intro
  d$Nametag <- factor(d$Nametag, labels = c("-Nametag", "+Nametag"))
  contrasts(d$Nametag) <- cbind(c(-.5, .5))

  d$Intro <- factor(d$Intro, labels = c("-Intro", "+Intro"))
  contrasts(d$Intro) <- cbind(c(-.5, .5))

  # Orthogonal Helmert contrast codes for Pronoun Pair
  d <- d |> rename("Pronoun" = "Pronoun_Pair")
  d$Pronoun <- d$Pronoun |>
    fct_relevel("T_HS", after = 0) |>
    fct_relevel("HS_T", after = 1)
  contrasts(d$Pronoun) <- cbind(
    "Target" = c(-.66, +.33, +.33),
    "Dist"   = c(0,    -.50, +.50)
  )

  # Add dummy-coded factor for They vs He/She
  d <- d |> mutate(Pronoun_They0 = ifelse(Pronoun == "T_HS", 0, 1))

  # Dummy code Nametag and Intro
  d <- d |> mutate(
    Nametag_Yes0 = ifelse(Nametag == "+Nametag", 0, 1),
    Nametag_No0 = ifelse(Nametag == "-Nametag", 0, 1),
    Intro_Yes0 = ifelse(Intro == "+Intro", 0, 1),
    Intro_No0 = ifelse(Intro == "-Intro", 0, 1)
  )

  # Scale character (1-18)
  d <- d |> mutate(.keep = c("unused"), Character = rescale(T_ID, c(-0.5, 0.5)))

  # Subset and order
  d <- d |> select(
    ParticipantID, Nametag, Nametag_Yes0, Nametag_No0,
    Intro, Intro_Yes0, Intro_No0, Pronoun, Pronoun_They0, Character, Accuracy
  )

  return(d)
}

# Adds participant covariates to accuracy data, mean-centers + rescales them----
exp3_load_data_subj <- function() {
  # Join participant covariates to accuracy df
  d <- left_join(
      exp3_load_data_acc(),
      read.csv("data/exp3_participant-covariates.csv", stringsAsFactors = TRUE),
      by = "ParticipantID"
    ) |>
    rename("Familiarity" = "UseThey", "Rating" = "Rating_Name")

  # Remove participants with no pronouns (1) or no survey data (3)
  d <- d |> filter(!is.na(Age))
  d$ParticipantID <- droplevels(d$ParticipantID)

  # Scale THEN mean-center (on accuracy df)
  d <- d |> mutate(
    Age_C = scale(Age / 80, center = TRUE, scale = FALSE),
    Familiarity_C = scale(Familiarity / 2, center = TRUE, scale = FALSE),
    GenderBeliefs_C = scale(GenderBeliefs / 60, center = TRUE, scale = FALSE),
    LGBTQ_C = LGBQ - 0.50,
    Rating_C = scale(Rating / 6, center = TRUE, scale = FALSE),
    Sharing_C = scale(Sharing / 20, center = TRUE, scale = FALSE)
  )

  # Effects-code LGBTQ
  d <- d |> mutate(LGBTQ_Fct = as.factor(LGBTQ_C))
  contrasts(d$LGBTQ_Fct) <- cbind(c(-0.5, +0.5))

  # Subset and order
  d <- d |> select(
    ParticipantID, Nametag, Intro, Pronoun, Character, Accuracy,
    Age, Age_C, Familiarity, Familiarity_C,
    GenderBeliefs, GenderBeliefs_C, LGBTQ_C, LGBTQ_Fct,
    Rating, Rating_C, Sharing, Sharing_C
  )

  return(d)
}

Estimate internal reliability

  • Before using the survey measures as predictors of production accuracy, we need to establish their internal reliability (Hedge, Powell, and Sumner 2017)
  • Used a Bayesian mixed-effects model approach comparing the by-participant slopes in each half of the data (Staub 2021)
  • Estimates of the relative accuracy of they/them compared to he/him + she/her for each participant were strongly correlated between halves of the data, r = 0.97 [0.90, 1.00]

Estimate internal reliability

Fit using the {brms} package (Bürkner 2017) in R:

# split into halves and create Pronoun effect vars for each half
exp3_d_reliability <- exp3_load_data_acc() |>
  select(ParticipantID, Pronoun, Accuracy) |>
  arrange(ParticipantID, Pronoun) |>  # sort by pronoun within participant
  mutate(Obs_Num = seq(1, length(Pronoun)))  |>
  mutate(Obs_Half = case_when(  # count odd and even trials
    is_even(Obs_Num) ~ "even",
    is_odd(Obs_Num)  ~ "odd"
  )) |>
  mutate(
    Pronoun_Even = case_when(  # effect of pronoun just in even trials
      Obs_Half == "even" & Pronoun == "T_HS" ~ -0.66,
      Obs_Half == "even" & Pronoun != "T_HS" ~ +0.33,
      Obs_Half == "odd" ~ 0
    ),
    Pronoun_Odd = case_when(  # effect of pronoun just in odd trials
      Obs_Half == "odd" & Pronoun == "T_HS" ~ -0.66,
      Obs_Half == "odd" & Pronoun != "T_HS" ~ +0.33,
      Obs_Half == "even" ~ 0
    )
  )

# run Bayesian model
exp3_m_reliability <- brm(
  formula = Accuracy ~ Pronoun_Even + Pronoun_Odd +  # fixed effects for halves
    (1 + Pronoun_Even + Pronoun_Odd | ParticipantID),  # random slopes by subj
  data = exp3_d_reliability,
  family = bernoulli(),  # keep default priors
  seed = 4, cores = 4,
  chains = 4, iter = 4000,
  file = "r_data/exp3_reliability"  # cache the fit; won't refit if this file exists
)
exp3_m_reliability

# tidy results
exp3_r_reliability <- exp3_m_reliability |>
  tidy() |>
  filter(str_detect(term, "Even") & str_detect(term, "Odd")) |>
  select(estimate, std.error, conf.low, conf.high) |>
  mutate(across(everything(), ~format(., digits = 2, nsmall = 2)))

exp3_r_reliability

Multilevel model

  • The data is nested within participants and items, so we fit a mixed-effects model with crossed random effects.
  • Pronoun is coded with orthogonal Helmert contrasts (the 1st contrast compares they/them to he/him + she/her; the 2nd is not relevant here). Nametag and Introduction use mean-centered effects coding (±.5). Demographic/survey variables are all mean-centered.
  • The maximal model justified by the experimental design: Accuracy ~ Pronoun * Nametag * Intro + (Pronoun | ParticipantID) + (1 | Character) (Baayen, Davidson, and Bates 2008; Barr et al. 2013).
  • The maximal model that converged only included random intercepts.
  • Stepwise regression to test if adding the demographic, language experience, and language attitude variables significantly improves model fit above the hypothesis-testing model.
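The hypothesis-testing models themselves aren’t shown on this slide; as a minimal sketch (assuming the exp3_load_data_acc() helper defined earlier — not the exact calls from the repo), the maximal model and the intercepts-only fallback would look like:

```r
library(lme4)

d <- exp3_load_data_acc()

# Maximal model justified by the design: by-participant Pronoun slopes
m_maximal <- glmer(
  Accuracy ~ Pronoun * Nametag * Intro +
    (1 + Pronoun | ParticipantID) + (1 | Character),
  data = d, family = binomial,
  control = glmerControl(optimizer = "bobyqa")
)

# Maximal model that converged: random intercepts only
m_converged <- glmer(
  Accuracy ~ Pronoun * Nametag * Intro +
    (1 | ParticipantID) + (1 | Character),
  data = d, family = binomial,
  control = glmerControl(optimizer = "bobyqa")
)
```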

Multilevel model

  • Using {lme4} for logistic mixed-effects regression modeling (Bates et al. 2015) and {buildmer} for stepwise model comparison (Voeten 2023)
  • Can run this locally, or on an Amazon EC2 instance (via {paws}) to save time
# Run in parallel with 6 clusters
# Won't work when running as background job, but otherwise much faster
cl6 <- makeCluster(6)  # make 6 clusters, keep default type
clusterEvalQ(cl6, library(buildmer))  # load needed packages on each worker
clusterExport(cl6, "exp3_d_subjCov")  # copy the data to each worker

exp3_m_subj_cov <- buildmer(
  formula = Accuracy ~ Pronoun * Nametag * Intro *  # allow all interactions
            Age_C * Familiarity_C * GenderBeliefs_C * LGBTQ_Fct +
            Rating_C * Sharing_C +
            (1 | ParticipantID) + (1 | Character),
  data = exp3_d_subjCov,
  family = binomial,
  buildmerControl = list(
    direction = c("order", "backward"),  # max then backwards elim (default)
    cl = cl6,
    args = list(control = glmerControl(optimizer = "bobyqa")),  # nlminbwrap had huge SEs
    # require Pronoun * Nametag * Intro and both random intercepts
    # aka keep hypothesis testing model
    include =
      "Pronoun * Nametag * Intro + (1 | ParticipantID) + (1 | Character)"
  )
)
stopCluster(cl6)
remove(cl6)

Model results

  • Intercept: More likely to produce the correct pronoun than not across all conditions (β = 13.16, z = 12.24, p < .001)
  • Pronoun: More accurate for he/him + she/her characters than for they/them characters (β = 5.05, z = 5.09, p < .001)
  • Pronoun × Nametag × Intro (β = 6.24, z = 3.91, p < .001)
    • –Nametag +Intro condition showed 95% accuracy for singular they
    • +Nametag +Intro and +Nametag –Intro conditions showed 91% accuracy
    • –Nametag –Intro condition showed 73% accuracy
  • Gender Beliefs: Participants who more strongly endorsed the gender binary and gender essentialism were less accurate overall (β = -10.07, z = -3.44, p < .001) and showed a larger relative difference in accuracy between they/them and he/him + she/her (β = 6.50, z = 3.53, p < .001)
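These estimates are in log-odds (logistic regression); a minimal sketch of the conversion back to probabilities, from which condition-level accuracies like those above are derived:

```r
# Log-odds convert to probabilities via the inverse-logit function
plogis(0)      # 0.5 = chance
plogis(13.16)  # ~1.0: the intercept implies near-ceiling overall accuracy

# Condition-level accuracies (95% / 91% / 73%) are model-predicted
# probabilities per design cell, e.g.:
# predict(model, newdata = cells, type = "response")
```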

Bonus Slides: Study Results & Materials

Demographics Questions

Age: ___________


What is your gender? ___________

(select all that apply)

[ ] My gender is the same as what was written on my original birth certificate

[ ] My gender is different than what was written on my original birth certificate

[ ] I consider myself cisgender

[ ] I consider myself transgender

[ ] I don’t consider myself cisgender or transgender

[ ] Prefer not to answer


How do you describe your sexuality? (select all that apply)

[ ] Asexual

[ ] Bisexual/Pansexual

[ ] Gay/Lesbian

[ ] Heterosexual/Straight

[ ] Queer

[ ] Questioning

[ ] Prefer not to answer

[ ] I use a different term: ___________


How do you describe your race/ethnicity?: (select all that apply)

[ ] American Indian or Alaska Native

[ ] Asian

[ ] Black, African American, or African

[ ] Hispanic, Latino, or Spanish

[ ] Middle Eastern or North African

[ ] Native Hawaiian or Pacific Islander

[ ] White

[ ] Prefer not to answer

[ ] I use a different term: ___________


What is your highest education level?

  • Less than high school
  • High school graduate
  • Some college
  • 2-year degree
  • 4-year degree
  • Professional degree
  • Doctorate
  • Prefer not to answer


Please rate your overall ability in the English language:

  • Native (learned from birth)
  • Fully competent in speaking, listening, reading, and writing, but not native
  • Limited but adequate competence in speaking, reading, and writing
  • Restricted ability (e.g., only reading or speaking/listening)
  • Some familiarity (e.g., a year of instruction in school)
  • Prefer not to answer

References

Baayen, R. H., D. J. Davidson, and D. M. Bates. 2008. “Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items.” Journal of Memory and Language 59 (4): 390–412. https://doi.org/10.1016/j.jml.2007.12.005.
Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78. https://doi.org/10.1016/j.jml.2012.11.001.
Bates, D. M., Martin Mächler, Benjamin M. Bolker, and Steven C. Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
Green, P., and C. J. MacLeod. 2016. “simr: An R Package for Power Analysis of Generalized Linear Mixed Models by Simulation.” Methods in Ecology and Evolution 7 (4): 493–98. https://doi.org/10.1111/2041-210X.12504.
Hedge, Craig, Georgina Powell, and Petroc Sumner. 2017. “The Reliability Paradox: Why Robust Cognitive Tasks Do Not Produce Reliable Individual Differences.” Behavior Research Methods 50 (3): 1166–86. https://doi.org/10.3758/s13428-017-0935-1.
Radford, Alec, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. “Robust Speech Recognition via Large-Scale Weak Supervision.” https://doi.org/10.48550/ARXIV.2212.04356.
Staub, Adrian. 2021. “How Reliable Are Individual Differences in Eye Movements in Reading?” Journal of Memory and Language 116: 104190. https://doi.org/10.1016/j.jml.2020.104190.
Voeten, Cesko C. 2023. Buildmer: Stepwise Elimination and Term Reordering for Mixed-Effects Regression. https://CRAN.R-project.org/package=buildmer.